Alpha-Divergences in Variational Dropout
نویسندگان
چکیده
We investigate the use of alternative divergences to Kullback-Leibler (KL) in variational inference(VI), based on the Variational Dropout [10]. Stochastic gradient variational Bayes (SGVB) [9] is a general framework for estimating the evidence lower bound (ELBO) in Variational Bayes. In this work, we extend the SGVB estimator with using Alpha-Divergences, which are alternative to divergences to VI’ KL objective. The Gaussian dropout can be seen as a local reparametrization trick of the SGVB objective. We extend the Variational Dropout to use alpha divergences for variational inference. Our results compare α-divergence variational dropout with standard variational dropout with correlated and uncorrelated weight noise. We show that the α-divergence with α → 1 (or KL divergence) is still a good measure for use in variational inference, in spite of the efficient use of Alpha-divergences for Dropout VI [14]. α→ 1 can yield the lowest training error, and optimizes a good lower bound for the evidence lower bound (ELBO) among all values of the parameter α ∈ [0,∞).
منابع مشابه
Dropout Inference in Bayesian Neural Networks with Alpha-divergences
To obtain uncertainty estimates with real-world Bayesian deep learning models, practical inference approximations are needed. Dropout variational inference (VI) for example has been used for machine vision and medical applications, but VI can severely underestimates model uncertainty. Alpha-divergences are alternative divergences to VI’s KL objective, which are able to avoid VI’s uncertainty un...
متن کاملPerturbative Black Box Variational Inference
Black box variational inference (BBVI) with reparameterization gradients triggered the exploration of divergence measures other than the Kullback-Leibler (KL) divergence, such as alpha divergences. In this paper, we view BBVI with generalized divergences as a form of estimating the marginal likelihood via biased importance sampling. The choice of divergence determines a bias-variance trade-off ...
متن کاملVariational Dropout Sparsifies Deep Neural Networks
We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse sol...
متن کاملInformation Dropout: learning optimal representations through noise
We introduce Information Dropout, a generalization of dropout that is motivated by the Information Bottleneck principle and highlights the way in which injecting noise in the activations can help in learning optimal representations of the data. Information Dropout is rooted in information theoretic principles, it includes as special cases several existing dropout methods, like Gaussian Dropout ...
متن کاملDifferentially Private Variational Dropout
Deep neural networks with their large number of parameters are highly flexible learning systems. The high flexibility in such networks brings with some serious problems such as overfitting, and regularization is used to address this problem. A currently popular and effective regularization technique for controlling the overfitting is dropout. Often, large data collections required for neural ne...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1711.04345 شماره
صفحات -
تاریخ انتشار 2017